ED 118 617 005 105



AUTHOR Nicolich, Mark J.

TITLE Longitudinal Data Analysis With Pictures, Regression and Principal Components.
PUB DATE [75]

NOTE 19p.



EDRS PRICE MF-$0.83 HC-$1.67 Plus Postage

DESCRIPTORS *Data Analysis; Graphs; *Longitudinal Studies; Multiple Regression Analysis; *Statistical Analysis

IDENTIFIERS Change Point Analysis; Principal Components Analysis

ABSTRACT

Several statistical techniques that can be used to ameliorate the difficulties inherent in the data analysis of longitudinal studies are presented. The first step in longitudinal data analysis is graphing. This permits visual inspection of the data, and with educated viewing can yield insights into the nature of the underlying mechanisms. The next level of sophistication is to apply regression analysis and change point analysis to the curves obtained from the graphical analysis. It is usually the case in longitudinal studies that the exact form of the curve is not known prior to the experimentation. The graphing of the data is useful in suggesting different mathematical models to apply to the curves. The results of the regression analyses will help determine the uniformity of the process across subjects. The next step is to use the form of the fitted equation to determine significant points on the curve. The shape of the curve will suggest change points in the subjects' behavior with respect to the dependent variable. In certain cases where problems arise, the use of principal components is called for. Practical advantages are that they explain the original curve best and will likely point to any existing major differences, and they occur mathematically and do not depend on the experimenter's ability to form a regression curve or pick important change points. When used in conjunction with each other, these techniques form a powerful package for analyzing longitudinal data. (RC)









U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF EDUCATION POSITION OR POLICY.



Longitudinal Data Analysis
with Pictures, Regression, and
Principal Components



Mark Nicolich, Ph.D.
Statistician, Mathematics Department
Rider College, Trenton, New Jersey






In a longitudinal study, when several variables are investigated, the data analysis is often perplexing. The independent variable is invariably time, but the appropriate choice of the dependent variable and method of analysis is not always clear. There are several statistical techniques that can be used to ameliorate these difficulties, and they range from simple to apply and interpret to conceptually complex. The techniques are graphing, regression, change point analysis and principal components.

There are additional high-powered statistical techniques that can be used to analyze longitudinal data: canonical correlation, multivariate analysis of variance, multivariate time series, and factor analysis with a wide choice of rotation procedures. While these techniques are applicable, their results are often difficult to interpret. When dealing with longitudinal data and subjects who are in a period of rapid development, a major difficulty is that the response variables vary with time, and not all responses vary at the same rate. This difficulty is further complicated by subject to subject variability. The beginning of a developmental trend is often not at the same chronological age for all subjects; in addition, the period to full development will vary from subject to subject. These forms of variability make analysis difficult.

This paper presents suggestions for ameliorating these problems of longitudinal data analysis when small samples are used.

The first step in an analysis of longitudinal data is to graph the data. This permits visual inspection of the data, and with educated viewing can yield insights into the nature of the underlying mechanisms. In almost all types of data analysis the visual inspection of data is important. With longitudinal data the picture of the data is almost always rewarding.

A series of graphs should be prepared for each subject. For each subject a separate graph is made for each dependent variable of interest. All graphs should have the same time scale to make interpretations simpler. The graphs are then laid out in a rectangle, with similar measures lined up in one direction and subjects lined up in the other direction. The "grid of graphs" is now open for inspection.
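Laying out the grid is mostly bookkeeping: one panel per measure and subject, all on a single time scale. A minimal sketch in plain Python, assuming each subject's data is stored as a dictionary mapping each measure to a list of (age in months, value) pairs; the function names, subject and measure names, and values are hypothetical.

```python
def grid_layout(data):
    """Return (measures, subjects, (t_min, t_max)) for a measure-by-subject grid.

    `data` maps subject -> measure -> list of (age_in_months, value) pairs.
    Rows of the grid are measures and columns are subjects, so similar
    measures line up in one direction and subjects in the other.
    """
    subjects = sorted(data)
    measures = sorted({m for s in subjects for m in data[s]})
    ages = [age for s in subjects for m in data[s] for age, _ in data[s][m]]
    # One shared time scale for every panel makes shifts and stretches
    # along the time axis directly comparable across panels.
    return measures, subjects, (min(ages), max(ages))


def panel(data, measure, subject):
    """Points for the panel at row `measure`, column `subject` (empty if not measured)."""
    return data[subject].get(measure, [])


# Hypothetical data for two subjects.
example = {
    "Subject A": {"imitation": [(14, 0.1), (15, 0.4), (16, 0.2)],
                  "play level": [(14, 1), (15, 2), (16, 3)]},
    "Subject B": {"imitation": [(16, 0.1), (17, 0.5), (18, 0.3)]},
}
measures, subjects, (t_min, t_max) = grid_layout(example)
for m in measures:
    for s in subjects:
        print(m, s, panel(example, m, s))
print("shared time axis:", t_min, "to", t_max)
```

To draw the panels, the row and column counts can size a matplotlib figure, for example `plt.subplots(len(measures), len(subjects), sharex=True)`, with the shared limits applied through each axis's `set_xlim`.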

First look for patterns from subject to subject on a particular dependent measure. Is the shape of the curve the same for each subject? Are some displaced along the time axis? Are some elongated or heightened relative to the others? Some subjects may have developed early or late and not have full curves, so the remaining portion will have to be added mentally. If all curves are almost the same, then there is little or no subject to subject variability on the given measure, and the phenomenon can be investigated using all subjects. If there is a displacement or distortion of the curve for some subjects, determine what distinguishes these subjects from the others. Is only one point astray? Did the subject have "a bad day" during that observation? Was the subject "learning from that kid next door"? These reasons might account for subject to subject variability.
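Two of the questions above, whether a curve is displaced along the time axis and whether a single point is astray, can be given rough numerical screens. The sketch below assumes that the onset of a pattern can be defined as the first age at which the measure reaches a chosen threshold, and flags an interior observation when it differs from the average of its two neighbours by more than a chosen tolerance. Both rules, the threshold, the tolerance and the data are illustrative assumptions rather than procedures from the study; they only point to panels worth a second look.

```python
def onset_age(series, threshold):
    """First age at which the measure reaches `threshold`, or None if it never does."""
    for age, value in series:
        if value >= threshold:
            return age
    return None


def astray_points(series, tol):
    """Ages of interior observations that differ from the mean of their two neighbours by more than `tol`."""
    flagged = []
    for (_, before), (age, value), (_, after) in zip(series, series[1:], series[2:]):
        if abs(value - (before + after) / 2) > tol:
            flagged.append(age)
    return flagged


# Hypothetical (age in months, value) series for three subjects.
series_a = [(14, 0.05), (15, 0.20), (16, 0.35), (17, 0.30), (18, 0.25)]
series_b = [(14, 0.00), (15, 0.02), (16, 0.05), (17, 0.22), (18, 0.36)]
series_c = [(15, 0.20), (16, 0.30), (17, 0.30), (18, 0.05), (19, 0.28)]

# Subject B reaches the (arbitrary) 0.2 threshold two months after subject A.
print(onset_age(series_a, 0.2), onset_age(series_b, 0.2))  # 15 17
print(astray_points(series_c, 0.15))  # [18]
```

A flagged point is a prompt to check the field notes for that visit, not a reason to discard it.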

Next look for patterns across the different measures from subject to subject. If a subject is relatively advanced in one measure, does this also hold for other measures? Does the pattern of differences coincide with the nature of the subject?

Patterns within subjects and patterns from subject to subject should emerge. If patterns do not show, then possibly the instruments used do not measure what they are intended to (they lack validity), or the underlying hypotheses need reorganizing, or the patterns are subtle and need more sophisticated analysis.

As an example, data from a one year longitudinal study were analyzed. Five female subjects, ranging in age from 14 to [illegible] months at the start of the study, were observed approximately monthly for a 40 minute period for the duration of the study (Nicolich, 1975). Of the measurements made in the original study, consideration will be given to two linguistic variables (type imitation rate, and number of word types) and one nonlinguistic variable (level of symbolic play). Figures 1 through 8 represent graphs derived from the data.

Figure 1 demonstrates several pattern changes for a single response variable for the five subjects. Tracy exhibits a shift, in that her pattern begins relatively late. Mira has her "steps" compressed; she probably went through the levels between 15 and 16 months of age, and the steps were missed. Shanti has a relatively elongated pattern. All five subjects have a similar pattern, but slight changes or individualities are noted.

Figure 2 graphs the type imitation rate for all five subjects on the same graph. It is difficult to make comparisons or draw conclusions from this graph. Five separate graphs, laid out side by side, yield a better picture and promote interpretation; see Figures 4 through 8. The time axis is the same for the first four subjects but changed for the last in order to give a larger graph. Note that it is Tracy, the late starter, who again has her pattern shifted to the right; this is also evident from Figure 2. Again the pattern of the five subjects is similar: a right-skewed, unimodal, platykurtic distribution (a mound trailing off to the right). Mira's pattern is most different, and again is compressed. Shanti again has an elongated pattern. The beginning of Meri's curve is missing, since the study began after the start of her development on this variable. The 18 month observation for Jan is out of pattern; notes made by the researcher at the time of the visit indicate that she was visited four hours later than usual and was tired and listless. This is an indication of the ability of the technique to highlight inconsistencies in the data.

Figure 3 is another example of similarity in subjects on yet another measure (broken into three submeasures). No sophisticated statistical techniques are required to show that, for all subjects studied, the spontaneous word types exceed the other categories after the first sessions. Other conclusions that can be made from the graph are that the number of spontaneous word types increases during the period under study while the other categories are relatively constant, that the spontaneous-imitated category is less frequent than the imitated, and that at a time near the 5th session there is a sharp increase in the number of spontaneous word types.

The graphical technique can be used as a preliminary analysis to more sophisticated techniques, to get a feel for the data and to detect patterns. It can also be a terminal analysis in some cases, when the results are strong and clear cut.

The next level of sophistication is to apply regression analysis and change point analysis to the curves obtained from the graphical analysis. It is usually the case in longitudinal studies that the exact form of the curve is not known prior to the experimentation. The graphing of the data is useful in suggesting different mathematical models to apply to the curves. In most cases several different curves should be fit to the data to find a curve which is uniformly acceptable for the data from each subject. If there are any subjects that behave differently from the majority, then either analyze them separately or remove them from further analysis. They should receive special attention to determine why they are different: are they unique to themselves, or do they represent another phenomenon?

The results of the regression analysis will help determine the uniformity of the process across subjects. If the significance level of the fitted curve is similar for all subjects, then it can be concluded that the phenomenon is uniform among the subjects. In the example, a curve of the form

$$y = A\,x^{B}\,e^{Cx}$$

was found to fit the curves of the five subjects depicted in Figures 4 through 8. In the equation, y is the type imitation rate, x is the age in months, e is the base of the natural logarithm, and A, B and C are the parameters to be estimated. The significance levels were .002, .02, .04, .02, and .06 for the regressions fitting Figures 4 through 8. The fit was uniform and good; thus it was concluded that the equation adequately modeled the phenomenon in these subjects. It can then be concluded that these five subjects all behave similarly with respect to time and type imitation rate.
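The paper does not say how A, B and C were estimated. One simple approach, sketched here under the assumption that the curve has the form y = A x^B e^(Cx) with A, B, C as defined in the text, is to take logarithms, log y = log A + B log x + C x, and solve the resulting linear least-squares problem through its normal equations. This requires every observed rate to be strictly positive and treats errors as multiplicative; a nonlinear least-squares fit on the original scale (for example `scipy.optimize.curve_fit`) would generally give somewhat different estimates. The function names and the synthetic data are hypothetical.

```python
import math


def solve3(m, v):
    """Solve the 3x3 system m x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_gamma_curve(ages, rates):
    """Estimate A, B, C in y = A * x**B * exp(C * x) by least squares on the log scale.

    Uses log y = log A + B log x + C x, so every rate must be strictly positive.
    """
    design = [(1.0, math.log(x), float(x)) for x in ages]
    log_y = [math.log(y) for y in rates]
    # Normal equations: (X^T X) beta = X^T log y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * ly for row, ly in zip(design, log_y)) for i in range(3)]
    log_a, b, c = solve3(xtx, xty)
    return math.exp(log_a), b, c


# Synthetic, noise-free rates generated from hypothetical parameters.
ages = list(range(12, 21))
rates = [0.001 * x**6 * math.exp(-0.4 * x) for x in ages]
A, B, C = fit_gamma_curve(ages, rates)
print(A, B, C)  # close to 0.001, 6.0, -0.4
```

Fitting each subject separately and comparing the resulting fits is what supports the uniformity judgment described above; this sketch covers only the estimation step, not the significance test.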

The purpose of fitting the regression curves is to determine uniformity of subject response. The next step is to use the form of the fitted equation to determine significant points on the curve. The shape of the curve will suggest change points in the subjects' behavior with respect to the dependent variable. The change points might be onset of the phenomenon, termination of the phenomenon, peak value, the time when a change in behavior is observed, or any other measure suggested by theory or curve shape. These change points are then investigated as additional, theoretical data points. They can be considered to be error free, in that they are derived from a fitted model and not from direct observation. They are really error free only insofar as the fitted model is error free.

In the example being used, the maximum type imitation rate was considered to be of theoretical importance. The maximum predicted from the fitted model was found for each fitted curve by use of the differential calculus; the maximum is plotted on the graphs. There is close agreement of theoretical and actual maxima in four of the five curves; again it is Mira who does not follow as well. The maximum tends to be near the transition from play level 3 to play level 4. It is also near a large increase in the number of single word imitations (see Figure 3). These observations are the type that can be made from this kind of analysis.
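Assuming the fitted curve has the form y = A x^B e^(Cx) with A > 0 and x > 0 (the symbols as defined in the text), the predicted maximum follows from setting the derivative to zero:

```latex
\frac{dy}{dx} = A\,x^{B-1} e^{Cx}\,(B + Cx) = y\left(\frac{B}{x} + C\right) = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{B}{C},
\qquad
y^{*} = A\left(-\frac{B}{C}\right)^{B} e^{-B}.
```

Because A x^(B-1) e^(Cx) > 0, the sign of dy/dx is the sign of B + Cx; when B > 0 and C < 0 it changes from positive to negative at x*, so x* is a maximum. With hypothetical values B = 6 and C = -0.4, for example, the peak falls at x* = 15 months.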






The practical significance and usefulness of the observations depend on the data at hand. In the example, the relationship of the maximum to the single word types was useful in explaining a linguistic phenomenon.

In addition to a visual inspection and interpretation of the change points, any type of data analysis can be used with the change points as data. A t test could be used to see if two subgroups of students (males and females, for example) differ in their change points. If the subjects were chosen according to an experimental design layout, the change points could be analyzed according to the design configuration, and the analysis would proceed as with any experimental design program. Alternatively, the change points might be used as a dependent variable in a regression with some measure on the subject, such as age at the start of the study, as the independent variable. When doing this type of analysis care must be taken. The usual caveats of analysis of variance and regression must be observed; in addition, if several change points per curve are to be analyzed, remember that they are not independent of each other and results from separate analyses will be correlated.
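A t test on change points, sketched in plain Python: the pooled-variance (Student) two-sample t statistic with n1 + n2 - 2 degrees of freedom. The peak ages below are hypothetical. A p-value requires the t distribution, which the standard library does not provide; `scipy.stats.ttest_ind` returns the same statistic together with a p-value. With samples as small as those in the example, such a test has little power.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Two-sample Student t statistic with pooled variance, plus its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical ages (months) at peak type imitation rate for two subgroups.
peaks_group1 = [15.0, 16.0, 17.0]
peaks_group2 = [18.0, 19.0, 20.0]
t, df = pooled_t(peaks_group1, peaks_group2)
print(round(t, 3), df)  # -3.674 4
```

Running this separately on several change points taken from the same curves reproduces exactly the correlated-results problem noted above.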






If the curves from the graphing are unwieldy, or do not lend themselves to regression analysis, or the regression results do not yield adequate change points, or if further analysis of important points is desired but their interdependency is a problem, then the use of principal components is called for. The full technique, with a complete example, is described by Church (1966).

Principal components analysis would use as input data measurements made on the dependent variable at chosen time points (the independent variable). The subjects would have to have been measured at the same points in time. The time would be either the subject's age or calendar time, depending on what is used as the independent variable. If the dependent variable was not measured at the same time for each subject, the value of the dependent variable at the required time could be approximated by interpolation between adjacent time values.
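Linear interpolation between adjacent visits is the simplest way to put every subject on common time points. A sketch, assuming each subject's visit ages are sorted and the common grid lies within the observed range; the function names and visit data are hypothetical.

```python
from bisect import bisect_left


def interpolate_at(ages, values, target):
    """Linear interpolation of one subject's measure at `target`.

    `ages` must be sorted ascending; targets outside the observed range raise
    ValueError, since filling them in would be extrapolation, not interpolation.
    """
    if not ages[0] <= target <= ages[-1]:
        raise ValueError(f"age {target} is outside the observed range")
    i = bisect_left(ages, target)  # first index with ages[i] >= target
    if ages[i] == target:
        return values[i]
    a0, a1 = ages[i - 1], ages[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (target - a0) / (a1 - a0)


def resample(ages, values, grid):
    """Values of one subject's measure at each common time point in `grid`."""
    return [interpolate_at(ages, values, t) for t in grid]


# Hypothetical visits that did not fall exactly on whole months.
visit_ages = [14.0, 15.5, 16.0, 17.5]
visit_values = [0.1, 0.4, 0.5, 0.2]
print(resample(visit_ages, visit_values, [15.0, 16.0, 17.0]))  # approximately [0.3, 0.5, 0.3]
```

Refusing to extrapolate is deliberate: a subject who started late, or whose development began before the study, simply has no value at the early or late grid points.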

Data for each subject would be a column of values of the dependent variable, each value taken at a predetermined point in time. The principal components analysis yields a set of linear combinations of these data points which "explain" the variability. A linear combination is a weighted sum of the values of the dependent variable taken over time. Each linear combination has a different set of weights. The "explanation" is the accounting for the variable not having the same value throughout the time period. The weightings often have direct interpretations, and they are interpreted in the same way as the factors in factor analysis. An important fact is that the components are uncorrelated with each other (and independent if the data are multivariate normal).
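A dependency-free sketch of the computation described above, assuming each subject's curve has already been placed on common time points (one list of values per subject). It forms the sample covariance matrix between time points and extracts the leading weightings by power iteration with deflation, a simple eigenvalue method chosen here only to avoid external libraries; `numpy.linalg.eigh` would be the usual choice. This is generic principal components analysis, not a reproduction of Church's (1966) full procedure. The toy curves are hypothetical, built so that one component is an overall level shift and the other a rise or fall across the period.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance between time points; `rows` holds one curve per subject."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)] for i in range(p)]


def principal_components(cov, k, iters=500, seed=0):
    """Top-k (variance, weights) pairs of a symmetric covariance matrix.

    Power iteration finds the dominant eigenvector; deflation then removes
    it so the next pass finds the next component.
    """
    rng = random.Random(seed)
    p = len(cov)
    m = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # nothing left to explain in this direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T m v gives the variance along unit vector v.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


# Hypothetical curves for four subjects measured at three common time points.
rows = [[2, 2, 2], [-2, -2, -2], [1, 0, -1], [-1, 0, 1]]
cov = covariance_matrix(rows)
total = sum(cov[i][i] for i in range(len(cov)))
for lam, weights in principal_components(cov, 2):
    # variance explained, weightings over time, proportion of total variance
    print(round(lam, 4), [round(w, 4) for w in weights], round(lam / total, 3))
```

For these curves the first component (equal weights, a level shift) carries about 86 percent of the variance, and the second (weights of opposite sign at the start and end) carries the rest.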

Usually a subset of the principal components (3 to 5 of them) is sufficient to "explain" the majority of the variability. The weightings of the principal components can be applied to each subject's dependent variable measurements to produce a set of individual, uncorrelated scores. These scores can be treated as change points, except that they may have a more complex interpretation than that of a maximum, minimum, etc. These scores can then be used in the t test, analysis of variance or regression as previously explained. The major statistical advantage of principal component scores over change points is that they are uncorrelated with each other, so a separate analysis can be performed for each score type without regard for the correlation of score types. Practical advantages are that they are "best at explaining" the original curve and will likely point to major differences if they exist, and they occur mathematically and do not depend on the experimenter's ability to form a regression curve or pick important change points.
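Applying the weightings to obtain one score per subject and component can be sketched as a weighted sum of each subject's deviations from the mean curve. The curves and weightings below are hypothetical; for these particular curves the two weight vectors shown are exactly the first two principal components. The scores on the two components come out uncorrelated, which is what permits a separate t test, analysis of variance or regression for each score type.

```python
import math


def component_scores(rows, weights):
    """Score each subject's curve on one component: weighted sum of deviations from the mean curve."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [sum(w * (r[j] - means[j]) for j, w in enumerate(weights)) for r in rows]


# Hypothetical curves (one per subject) at three common time points.
rows = [[2, 2, 2], [-2, -2, -2], [1, 0, -1], [-1, 0, 1]]
w_level = [1 / math.sqrt(3)] * 3                      # overall level of the curve
w_trend = [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)]  # early minus late values

s_level = component_scores(rows, w_level)
s_trend = component_scores(rows, w_trend)
print([round(s, 3) for s in s_level])  # [3.464, -3.464, 0.0, 0.0]
print([round(s, 3) for s in s_trend])  # [0.0, 0.0, 1.414, -1.414]

# Scores have mean zero, so a zero cross-product means zero correlation.
print(sum(a * b for a, b in zip(s_level, s_trend)))  # 0.0
```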

No example is presented for this technique because the data used in the examples did not require this level of sophistication. The inspection of change points was sufficient to indicate the underlying phenomena. Discussions and examples of principal components analysis can be found in Cooley and Lohnes (1971), Morrison (1967) and Van de Geer (1971).

The series of techniques described forms a powerful package for analyzing longitudinal data. When used in conjunction with each other, they can indicate new approaches and ideas concerning the data, as well as verify predetermined hypotheses.



References

Church, Alonzo, Jr. Analysis of data when the response is a curve. Technometrics, May 1966, 8(2), 229-246.

Cooley, W. W., & Lohnes, P. R. Multivariate Data Analysis. New York: Wiley and Sons, 1971.

Morrison, D. F. Multivariate Statistical Methods. New York: McGraw-Hill, 1967.

Nicolich, L. A Longitudinal Study of Representational Play in Relation to Spontaneous Imitation and Development of Multiword Utterances. Doctoral dissertation, Rutgers University, 1975.

Van de Geer, J. Introduction to Multivariate Analysis for the Social Sciences. San Francisco: W. H. Freeman and Company, 1971.



[Figure 1. Age at attainment of symbolic play levels (horizontal axis: age in months).]

[Remaining figures: type imitation rate (%) plotted against age in months; plot detail not recoverable from this reproduction.]



